Problem Note 44478: Text Miner or Text Parsing node results might be incorrect if a document contains characters that can be interpreted as HTML tags
If you use Text Parsing or Text Miner nodes in SAS® Text Miner, then the results might be incorrect. The problem occurs when a document contains characters that can be interpreted as HTML tags. The nodes might incorrectly remove text. As a result, some terms are missing from the analysis.
There is no general workaround for the problem.
There are no errors or warnings that indicate that text was removed incorrectly.
One specific example involves a less-than symbol (<) in a document. If a greater-than symbol (>) appears later in the document, then the nodes might incorrectly interpret the text between those two symbols as HTML. The text is stripped from the document.
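Because no error or warning is issued, it can help to scan documents for this pattern before you run the nodes. The following Python sketch flags any span that starts at a less-than symbol and ends at the next greater-than symbol. The function name `at_risk_spans` and the regular expression are illustrative assumptions; they approximate the behavior described above and do not reproduce the logic that the nodes actually use.

```python
import re

# Assumption: a span is at risk when a "<" is followed later by a ">".
# This only approximates the behavior described in this note; it is not
# the parser that the Text Parsing or Text Miner nodes use.
AT_RISK = re.compile(r"<[^>]*>")


def at_risk_spans(text):
    """Return each substring that runs from a '<' to the next '>'."""
    return AT_RISK.findall(text)


if __name__ == "__main__":
    # Hypothetical sample documents, for illustration only.
    samples = [
        "Revenue < 10 and cost > 5 this year",
        "No angle brackets here",
    ]
    for doc in samples:
        spans = at_risk_spans(doc)
        if spans:
            print("Possible text loss:", spans)
```

A document with no flagged spans is not guaranteed to be safe, because other characters might also be interpreted as HTML.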
To work around this specific example, remove paired less-than (<) and greater-than (>) symbols from your document. For example, use an editor to replace "<" with "lt" and ">" with "gt".
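For a large collection, the same replacement can be scripted rather than done by hand in an editor. The following Python sketch applies the substitution to a string. The function name `replace_angle_brackets` and its optional `lt` and `gt` parameters are hypothetical and are not part of SAS Text Miner.

```python
def replace_angle_brackets(text, lt="lt", gt="gt"):
    """Replace every '<' with lt and every '>' with gt.

    Assumption: every angle bracket in the text is plain content, not
    real HTML markup. All occurrences are replaced, paired or not.
    """
    return text.replace("<", lt).replace(">", gt)


if __name__ == "__main__":
    # Hypothetical sample text, for illustration only.
    print(replace_angle_brackets("x < 5 and y > 2"))
```

If a symbol touches the adjacent words (for example, "a<b"), passing padded values such as `lt=" lt "` keeps the replacement from merging into neighboring terms. Apply the replacement only to documents that do not contain genuine HTML, because it alters real tags as well.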
Operating System and Release Information
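SAS System | SAS Text Miner |
System | Product Release (Reported) | Product Release (Fixed*) | SAS Release (Reported) | SAS Release (Fixed*) |
Tru64 UNIX | 4.1 | 5.1_M1 | | 9.3 TS1M1 |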
64-bit Enabled AIX | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
64-bit Enabled HP-UX | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
64-bit Enabled Solaris | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
ABI+ for Intel Architecture | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
AIX | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
HP-UX | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
HP-UX IPF | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Linux | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Linux for x64 | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Windows Millennium Edition (Me) | 4.1 | | | |
Windows Vista | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
z/OS | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Microsoft® Windows® for x64 | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Windows 7 Ultimate 32 bit | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Windows 7 Professional x64 | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Windows 7 Home Premium x64 | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Windows 7 Home Premium 32 bit | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Windows 7 Enterprise 32 bit | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Windows 7 Enterprise x64 | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Microsoft Windows XP Professional | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Microsoft Windows Server 2008 for x64 | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Microsoft Windows Server 2003 for x64 | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Microsoft Windows Server 2008 | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Microsoft Windows Server 2003 Standard Edition | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Microsoft Windows Server 2003 Enterprise Edition | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Microsoft Windows Server 2003 Datacenter Edition | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Microsoft Windows 2000 Professional | 4.1 | | | |
Microsoft Windows 2000 Server | 4.1 | | | |
Microsoft Windows 2000 Datacenter Server | 4.1 | | | |
Microsoft Windows 95/98 | 4.1 | | | |
Solaris | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Windows Vista for x64 | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Windows 7 Ultimate x64 | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Windows 7 Professional 32 bit | 4.1 | 5.1_M1 | | 9.3 TS1M1 |
Microsoft Windows NT Workstation | 4.1 | | | |
Microsoft Windows 2000 Advanced Server | 4.1 | | | |
* For software releases that are not yet generally available, the Fixed Release is the software release in which the problem is planned to be fixed.
Type: | Problem Note |
Priority: | alert |
Date Modified: | 2020-04-02 10:51:16 |
Date Created: | 2011-09-30 08:16:22 |